Overview
Fetches NSE surveillance lists (ASM and GSM) from Google Sheets (primary) with dual fallbacks to Dhan’s Next.js API and web scraping. These lists identify stocks under regulatory surveillance due to price volatility or other concerns.
Source: fetch_surveillance_lists.py
Phase: Phase 2 (Enrichment)
Output: nse_asm_list.json, nse_gsm_list.json
Data Sources
The script implements a 3-tier fallback strategy:
Primary Source: Google Sheets Gviz API
GET https://docs.google.com/spreadsheets/d/1zqhM3geRNW_ZzEx62y0W5U2ZlaXxG-NDn0V8sJk5TQ4/gviz/tq?tqx=out:json&gid={gid}
Google Sheet tab IDs (gid):
- ASM List: 290894275
- GSM List: 1525483995
Secondary Source: Dhan Next.js JSON API
GET https://dhan.co/_next/data/{buildId}/{data_key}.json
The buildId is fetched dynamically from the page source of https://dhan.co/all-indices/. The data_key values are:
- ASM: nse-asm-list
- GSM: nse-gsm-list
Tertiary Source: Web Scraping
Fallback scraping from:
https://dhan.co/nse-asm-list/
https://dhan.co/nse-gsm-list/
Extracts data from <script id="__NEXT_DATA__"> JSON block.
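The tertiary extraction step can be sketched as follows. This is a minimal illustration, assuming the page embeds its data in the __NEXT_DATA__ script tag as described above. It uses a standard-library regex rather than BeautifulSoup so that it runs without extra packages; the function name `extract_next_data`, the sample HTML, and the `rows` key are invented for the example.

```python
import json
import re

# Matches the JSON body of <script id="__NEXT_DATA__" ...>...</script>.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or None if it is absent or invalid."""
    match = NEXT_DATA_RE.search(html)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


# Hypothetical page content; the real payload layout is not documented here.
sample_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"rows": [{"sym": "YESBANK"}]}}}'
    "</script></body></html>"
)
next_data = extract_next_data(sample_html)
```

The parsed object can then be searched for the list of stock records, since its exact location inside the payload may change between site builds.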
Configuration
lists_config = {
    "nse_asm_list.json": {
        "gid": "290894275",
        "web_url": "https://dhan.co/nse-asm-list/",
        "data_key": "nse-asm-list"
    },
    "nse_gsm_list.json": {
        "gid": "1525483995",
        "web_url": "https://dhan.co/nse-gsm-list/",
        "data_key": "nse-gsm-list"
    }
}
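To show how one configuration entry could map onto the three tiers, here is a hedged sketch. The URL templates come from the Data Sources section; the helper `candidate_urls` and the constant names are hypothetical and are not taken from fetch_surveillance_lists.py.

```python
# Hypothetical constants; the URL shapes are those listed under Data Sources.
SHEET_ID = "1zqhM3geRNW_ZzEx62y0W5U2ZlaXxG-NDn0V8sJk5TQ4"
GVIZ_URL = (
    "https://docs.google.com/spreadsheets/d/{sheet_id}"
    "/gviz/tq?tqx=out:json&gid={gid}"
)
NEXT_URL = "https://dhan.co/_next/data/{build_id}/{data_key}.json"

lists_config = {
    "nse_asm_list.json": {
        "gid": "290894275",
        "web_url": "https://dhan.co/nse-asm-list/",
        "data_key": "nse-asm-list",
    },
    "nse_gsm_list.json": {
        "gid": "1525483995",
        "web_url": "https://dhan.co/nse-gsm-list/",
        "data_key": "nse-gsm-list",
    },
}


def candidate_urls(cfg, build_id):
    """Return (tier, url) pairs in fallback order for one config entry."""
    urls = [("gviz", GVIZ_URL.format(sheet_id=SHEET_ID, gid=cfg["gid"]))]
    # The Next.js JSON tier is only usable when a buildId was found.
    if build_id:
        urls.append(
            ("next_json",
             NEXT_URL.format(build_id=build_id, data_key=cfg["data_key"]))
        )
    urls.append(("scrape", cfg["web_url"]))
    return urls


asm_urls = candidate_urls(lists_config["nse_asm_list.json"], "abc123")
```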
Function Signatures
get_build_id()
def get_build_id():
    """
    Dynamically fetch the Next.js buildId.

    Returns:
        str: Build ID string, or None if extraction fails
    """
Extracts buildId from page source using regex:
match = re.search(r'"buildId":"([^"]+)"', response.text)
return match.group(1) if match else None
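The regex step can be tried in isolation on a string. The sketch below separates the pure extraction from the HTTP request; the name `extract_build_id` and the sample page source are illustrative assumptions. Note that the pattern expects compact JSON with no space after the colon.

```python
import re

# Same pattern as above: captures the value of the "buildId" key.
BUILD_ID_RE = re.compile(r'"buildId":"([^"]+)"')


def extract_build_id(page_source):
    """Return the buildId found in page_source, or None if there is none."""
    match = BUILD_ID_RE.search(page_source)
    return match.group(1) if match else None


# Hypothetical page fragment used only to exercise the pattern.
sample_source = (
    '<script id="__NEXT_DATA__">'
    '{"props":{},"page":"/all-indices","buildId":"aBc123XyZ"}'
    "</script>"
)
build_id = extract_build_id(sample_source)
```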
fetch_surveillance_lists()
def fetch_surveillance_lists():
    """
    Main function that fetches both ASM and GSM lists using 3-tier fallback.
    Writes nse_asm_list.json and nse_gsm_list.json to current directory.
    """
Output Structure
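Each output file is a JSON array of objects with these fields:
- Symbol: Stock trading symbol (e.g., “YESBANK”)
- Name: Company name (e.g., “Yes Bank Limited”)
- ISIN: ISIN code of the security
- Stage: Surveillance stage (e.g., “LTASM”, “STASM”, “GSM”)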
Example Output (ASM List)
[
    {
        "Symbol": "YESBANK",
        "Name": "Yes Bank Limited",
        "ISIN": "INE528G01035",
        "Stage": "LTASM"
    },
    {
        "Symbol": "SUZLON",
        "Name": "Suzlon Energy Limited",
        "ISIN": "INE040H01021",
        "Stage": "STASM"
    }
]
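Writing records in this shape could look like the sketch below. The helper `save_list`, the field check, and the 4-space indent are assumptions for illustration; the actual script may serialize differently.

```python
import json
import os
import tempfile

REQUIRED_FIELDS = ("Symbol", "Name", "ISIN", "Stage")


def save_list(items, path):
    """Check that every record has the four fields, then write JSON to path."""
    for item in items:
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            raise ValueError(f"record {item!r} is missing {missing}")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(items, fh, indent=4)
    return len(items)


records = [
    {
        "Symbol": "YESBANK",
        "Name": "Yes Bank Limited",
        "ISIN": "INE528G01035",
        "Stage": "LTASM",
    },
]

# Write to a temporary directory so the example leaves no files behind in cwd.
out_path = os.path.join(tempfile.mkdtemp(), "nse_asm_list.json")
saved = save_list(records, out_path)
with open(out_path, encoding="utf-8") as fh:
    round_trip = json.load(fh)
```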
From Gviz API Response
text = response.text
match = re.search(r'setResponse\((.*)\);', text)
if match:
    data = json.loads(match.group(1))
    rows = data.get('table', {}).get('rows', [])
    for row in rows:
        c = row.get('c', [])
        if len(c) >= 5:
            symbol = c[1].get('v') if c[1] else None
            name = c[2].get('v') if c[2] else None
            isin = c[3].get('v') if c[3] else None
            stage = c[4].get('v') if c[4] else None
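The fragment above can be wrapped into a self-contained function and run against a canned response. This is a sketch under stated assumptions: the function name `parse_gviz` and the sample payload are invented, and the column positions (symbol in column 1, stage in column 4) simply follow the fragment.

```python
import json
import re


def parse_gviz(text):
    """Turn a Gviz JSONP-style response into a list of record dicts."""
    match = re.search(r'setResponse\((.*)\);', text)
    if not match:
        return []
    data = json.loads(match.group(1))
    items = []
    for row in data.get("table", {}).get("rows", []):
        c = row.get("c", [])
        if len(c) < 5:
            continue
        # Cells may be null in the response, so guard each lookup.
        symbol, name, isin, stage = (
            (cell.get("v") if cell else None) for cell in c[1:5]
        )
        # Skip the header row and rows without a symbol.
        if not symbol or symbol == "Symbol":
            continue
        items.append(
            {"Symbol": symbol, "Name": name, "ISIN": isin, "Stage": stage}
        )
    return items


# Hypothetical payload: a header row, a row with a null symbol, one valid row.
payload = {"table": {"rows": [
    {"c": [{"v": "Sr"}, {"v": "Symbol"}, {"v": "Name"}, {"v": "ISIN"},
           {"v": "Stage"}]},
    {"c": [{"v": 1}, None, {"v": "?"}, {"v": "?"}, {"v": "?"}]},
    {"c": [{"v": 2}, {"v": "YESBANK"}, {"v": "Yes Bank Limited"},
           {"v": "INE528G01035"}, {"v": "LTASM"}]},
]}}
sample_text = (
    "/*O_o*/\ngoogle.visualization.Query.setResponse("
    + json.dumps(payload)
    + ");"
)
parsed = parse_gviz(sample_text)
```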
From Next.js JSON
def find_list(obj):
    if isinstance(obj, list) and len(obj) > 3:
        if isinstance(obj[0], dict) and ('sym' in obj[0] or 'Sym' in obj[0]):
            return obj
    if isinstance(obj, dict):
        for v in obj.values():
            res = find_list(v)
            if res:
                return res
    return None
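A short demonstration of how this recursive search behaves, using invented payloads (the key names `props`, `pageProps`, and `listData` are hypothetical). Two properties follow directly from the code: lists with three or fewer items are ignored, and the search descends through dict values only, not through list elements.

```python
def find_list(obj):
    """Return the first list of more than 3 dicts that carry a 'sym'/'Sym' key."""
    if isinstance(obj, list) and len(obj) > 3:
        if isinstance(obj[0], dict) and ("sym" in obj[0] or "Sym" in obj[0]):
            return obj
    if isinstance(obj, dict):
        for v in obj.values():
            res = find_list(v)
            if res:
                return res
    return None


stocks = [{"sym": "A"}, {"sym": "B"}, {"sym": "C"}, {"sym": "D"}]

# The list is found wherever it sits inside nested dicts.
nested = {"props": {"pageProps": {"meta": {"title": "ASM"}, "listData": stocks}}}
found = find_list(nested)

# A list with only one record is too short to match.
too_short = find_list({"listData": [{"sym": "A"}]})

# A matching list wrapped inside another list is not reached.
inside_list = find_list({"wrapper": [{"listData": stocks}]})
```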
Dependencies
requests — HTTP client
json — JSON parsing
re — Regex for buildId and Gviz extraction
BeautifulSoup — HTML parsing for fallback scraping
Error Handling
- Graceful fallback: if Gviz fails, tries Next.js JSON; if that fails, scrapes webpage
- On total failure (all three sources fail), returns an empty list and reports an error message
- 10-second timeout for all HTTP requests
- Skips the header row (where Symbol == "Symbol") and rows with no symbol:
    if symbol == "Symbol" or not symbol:
        continue
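The graceful-fallback behavior listed above can be sketched as a loop over ordered sources. This is an illustration rather than the script's actual control flow: `first_successful` and the three stand-in fetchers are hypothetical, and the real fetchers would issue HTTP requests with the 10-second timeout.

```python
def first_successful(sources):
    """Try each (label, fetcher) in order and return the first non-empty result.

    Returns (label, rows) on success, or (None, []) if every source fails.
    """
    for label, fetch in sources:
        try:
            rows = fetch()
        except Exception as exc:  # a failed tier must not stop the chain
            print(f"{label} failed: {exc}")
            continue
        if rows:
            return label, rows
        print(f"{label} returned no rows")
    print("All sources failed.")
    return None, []


# Stand-in fetchers that simulate each tier without touching the network.
def failing_gviz():
    raise TimeoutError("simulated timeout")


def empty_next_json():
    return []


def working_scrape():
    return [{"Symbol": "YESBANK", "Name": "Yes Bank Limited",
             "ISIN": "INE528G01035", "Stage": "LTASM"}]


label, rows = first_successful([
    ("Gviz", failing_gviz),
    ("Next.js JSON", empty_next_json),
    ("Web scrape", working_scrape),
])
no_label, no_rows = first_successful([("Gviz", failing_gviz)])
```

Treating an empty result the same as an exception keeps a tier that responds with no data from masking a later tier that could still succeed.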
Usage Example
python3 fetch_surveillance_lists.py
Expected Output:
Primary Fetch: Gviz API (Spreadsheet) for nse_asm_list.json...
Successfully saved 142 items via Gviz.
Primary Fetch: Gviz API (Spreadsheet) for nse_gsm_list.json...
Successfully saved 28 items via Gviz.
Integration
This script is part of Phase 2 (Enrichment) in the EDL Pipeline. The output files are consumed by:
add_corporate_events.py: adds “★: LTASM / STASM” event markers to stocks
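How a downstream consumer might use the ASM file is sketched below. The internals of add_corporate_events.py are not shown in this document, so both helper functions and the exact marker string format are assumptions based only on the marker text quoted above.

```python
def build_stage_lookup(items):
    """Map each symbol to its surveillance stage, skipping incomplete records."""
    return {
        item["Symbol"]: item["Stage"]
        for item in items
        if item.get("Symbol") and item.get("Stage")
    }


def surveillance_marker(symbol, lookup):
    """Return a marker such as '★: LTASM', or None if the symbol is not listed."""
    stage = lookup.get(symbol)
    return f"★: {stage}" if stage else None


# Records in the documented output shape (normally loaded from nse_asm_list.json).
asm_items = [
    {"Symbol": "YESBANK", "Name": "Yes Bank Limited",
     "ISIN": "INE528G01035", "Stage": "LTASM"},
    {"Symbol": "SUZLON", "Name": "Suzlon Energy Limited",
     "ISIN": "INE040H01021", "Stage": "STASM"},
]
lookup = build_stage_lookup(asm_items)
marker = surveillance_marker("YESBANK", lookup)
```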
Run via master pipeline:
python3 run_full_pipeline.py